non-smooth optimization
An Optimal Structured Zeroth-order Algorithm for Non-smooth Optimization
Finite-difference methods are a class of algorithms designed to solve black-box optimization problems by approximating the gradient of the target function along a set of directions. In black-box optimization, the non-smooth setting is particularly relevant since, in practice, differentiability and smoothness assumptions cannot be verified. To cope with non-smoothness, several authors use a smooth approximation of the target function and show that finite-difference methods approximate its gradient. Recently, it has been proved that imposing structure on the set of directions improves performance. However, only the smooth setting was considered.
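As a concrete illustration of the idea, here is a minimal sketch of a finite-difference gradient surrogate built from structured (orthonormal) directions, assuming central differences and directions drawn via QR factorization of a Gaussian matrix; the name `structured_fd_gradient` and the `d / l` scaling are illustrative choices, not necessarily the paper's exact estimator.

```python
import numpy as np

def structured_fd_gradient(f, x, h=1e-3, num_dirs=None, seed=None):
    """Central finite differences along mutually orthogonal random directions."""
    rng = np.random.default_rng(seed)
    d = x.size
    l = num_dirs or d
    # Structured directions: orthonormal columns from QR of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((d, l)))
    g = np.zeros(d)
    for i in range(l):
        p = Q[:, i]
        g += (f(x + h * p) - f(x - h * p)) / (2 * h) * p
    return (d / l) * g

# Toy usage on a non-smooth target (the l1 norm) with a decaying step size.
f = lambda x: np.abs(x).sum()
x = np.ones(10)
for t in range(1, 201):
    x = x - 0.5 / np.sqrt(t) * structured_fd_gradient(f, x, num_dirs=5, seed=t)
```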
Distributed stochastic proximal algorithm with random reshuffling for non-smooth finite-sum optimization
Jiang, Xia, Zeng, Xianlin, Sun, Jian, Chen, Jie, Xie, Lihua
Non-smooth finite-sum minimization is a fundamental problem in machine learning. This paper develops a distributed stochastic proximal-gradient algorithm with random reshuffling to solve finite-sum minimization over time-varying multi-agent networks. The objective function is a sum of differentiable convex functions and a non-smooth regularization term. Each agent in the network updates its local variables with a constant step size using local information and cooperates to seek an optimal solution. We prove that the local variable estimates generated by the proposed algorithm achieve consensus and are attracted to a neighborhood of the optimal solution in expectation with an $\mathcal{O}(\frac{1}{T}+\frac{1}{\sqrt{T}})$ convergence rate, where $T$ is the total number of iterations. Finally, comparative simulations are provided to verify the convergence performance of the proposed algorithm.
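A minimal sketch of the update pattern described above, assuming a static doubly-stochastic mixing matrix `W`, an $\ell_1$ regularizer (whose prox is soft-thresholding), and equal local sample counts; the paper's algorithm handles time-varying networks, which this simplified sketch does not.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def distributed_prox_rr(grads, W, d, alpha=0.01, lam=0.1, epochs=50, seed=0):
    """Distributed proximal-gradient sketch with random reshuffling.

    grads[i] is a list of per-sample gradient functions held by agent i;
    W is a doubly-stochastic mixing matrix encoding the (static) network.
    Assumes all agents hold the same number of samples, for simplicity.
    """
    rng = np.random.default_rng(seed)
    n = len(grads)
    X = np.zeros((n, d))                   # one local iterate per agent
    for _ in range(epochs):
        orders = [rng.permutation(len(g)) for g in grads]  # local reshuffle
        for k in range(len(orders[0])):
            X = W @ X                      # consensus: mix with neighbors
            for i in range(n):
                X[i] -= alpha * grads[i][orders[i][k]](X[i])
                X[i] = soft_threshold(X[i], alpha * lam)    # prox step
    return X

# Toy usage: 3 agents, each with 4 per-sample least-squares gradients.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4, 5))
b = rng.standard_normal((3, 4))
grads = [[(lambda a, y: lambda x: (a @ x - y) * a)(A[i, j], b[i, j])
          for j in range(4)] for i in range(3)]
W = np.full((3, 3), 1 / 3)                 # complete graph, uniform weights
X = distributed_prox_rr(grads, W, d=5)
```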
A Richer Theory of Convex Constrained Optimization with Reduced Projections and Improved Rates
Yang, Tianbao, Lin, Qihang, Zhang, Lijun
This paper focuses on convex constrained optimization problems, where the solution is subject to a convex inequality constraint. In particular, we aim at challenging problems for which both projection onto the constrained domain and linear optimization under the inequality constraint are time-consuming, rendering both projected gradient methods and conditional gradient methods (a.k.a. the Frank-Wolfe algorithm) expensive. In this paper, we develop projection-reduced optimization algorithms for both smooth and non-smooth optimization with improved convergence rates under a certain regularity condition of the constraint function. We first present a general theory of optimization with only one projection. Its application to smooth optimization with only one projection yields $O(1/\epsilon)$ iteration complexity, which improves over the $O(1/\epsilon^2)$ iteration complexity established before for non-smooth optimization and can be further reduced under strong convexity. Then we introduce a local error bound condition and develop faster algorithms for non-strongly convex optimization at the price of a logarithmic number of projections. In particular, we achieve an iteration complexity of $\widetilde O(1/\epsilon^{2(1-\theta)})$ for non-smooth optimization and $\widetilde O(1/\epsilon^{1-\theta})$ for smooth optimization, where $\theta\in(0,1]$, appearing in the local error bound condition, characterizes the local growth rate of the function around the optimal solutions. Novel applications to the constrained $\ell_1$ minimization problem and a positive semi-definite constrained distance metric learning problem demonstrate that the proposed algorithms achieve significant speed-ups compared with previous algorithms.
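To make the "only one projection" idea concrete, here is a generic sketch, not the paper's exact scheme: subgradient steps are taken on a penalized objective with no intermediate projections, and a single projection onto the feasible set is applied at the very end. The penalty weight `gamma` and the iterate averaging are illustrative choices.

```python
import numpy as np

def one_projection_subgradient(f_sub, g, g_sub, project, x0, gamma=10.0,
                               step=0.1, T=1000):
    """Illustrative one-projection template.

    Runs subgradient steps on f(x) + gamma * max(0, g(x)) without ever
    projecting, then projects the averaged iterate onto {x : g(x) <= 0}
    exactly once at the end.
    """
    x = x0.copy()
    avg = np.zeros_like(x0)
    for t in range(1, T + 1):
        sub = f_sub(x)
        if g(x) > 0:                       # penalize constraint violation
            sub = sub + gamma * g_sub(x)
        x -= step / np.sqrt(t) * sub
        avg += (x - avg) / t               # running average of iterates
    return project(avg)                    # the single projection

# Toy usage: minimize ||x - c||_1 subject to ||x||_2 <= 1.
c = np.array([2.0, -1.0, 0.5])
sol = one_projection_subgradient(
    f_sub=lambda x: np.sign(x - c),
    g=lambda x: np.linalg.norm(x) - 1.0,
    g_sub=lambda x: x / (np.linalg.norm(x) + 1e-12),
    project=lambda x: x / max(1.0, np.linalg.norm(x)),
    x0=np.zeros(3),
)
```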
Inexact Proximal Gradient Methods for Non-convex and Non-smooth Optimization
Gu, Bin, Huo, Zhouyuan, Huang, Heng
Non-convex and non-smooth optimization plays an important role in machine learning. The proximal gradient method is one of the most important methods for solving non-convex and non-smooth problems, where a proximal operator needs to be solved exactly at each step. However, in many problems the proximal operator has no analytic solution, or an exact solution is expensive to obtain. In this paper, we propose inexact proximal gradient methods (not only a basic inexact proximal gradient method (IPG), but also a Nesterov-accelerated inexact proximal gradient method (AIPG)) for non-convex and non-smooth optimization, which tolerate an error in the calculation of the proximal operator. Theoretical analysis shows that IPG and AIPG have the same convergence rates as in the error-free case, provided that the errors decrease at appropriate rates. Keywords: Non-convex optimization, non-smooth optimization, proximal gradient, inexact proximal operator, Nesterov's accelerated method
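A sketch of the inexact idea under simplifying assumptions: the prox subproblem is solved approximately by a few subgradient steps, with inner accuracy growing over the outer iterations to mimic a decreasing-error condition. The $\ell_1$ regularizer in the toy usage actually has a closed-form prox; it is used here only so the sketch is easy to check.

```python
import numpy as np

def inexact_prox(v, r_sub, lam, inner_iters):
    """Approximate prox of lam * r at v via subgradient steps on the
    subproblem 0.5 * ||z - v||^2 + lam * r(z).  More inner iterations
    means a smaller prox error."""
    z = v.copy()
    for k in range(1, inner_iters + 1):
        z -= (1.0 / k) * ((z - v) + lam * r_sub(z))
    return z

def ipg(grad_f, r_sub, x0, step, lam=0.1, T=200):
    """Basic inexact proximal gradient loop (IPG-style sketch): gradient
    step on the smooth part f, then an inexact prox of the non-smooth
    part r, with inner accuracy increasing over time."""
    x = x0.copy()
    for t in range(1, T + 1):
        v = x - step * grad_f(x)
        x = inexact_prox(v, r_sub, step * lam, inner_iters=t)  # error shrinks
    return x

# Toy usage: f(x) = 0.5 * ||Ax - b||^2 with r = ||.||_1 (subgradient: sign).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
x_hat = ipg(lambda x: A.T @ (A @ x - b), np.sign, np.zeros(10),
            step=1.0 / np.linalg.norm(A, 2) ** 2)
```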
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has been theoretically studied for decades, the classical analysis usually requires non-trivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions, such as support vector machines. In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running-average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. In this framework, we prove that after $T$ rounds, the suboptimality of the last SGD iterate scales as $O(\log(T)/\sqrt{T})$ for non-smooth convex objective functions, and $O(\log(T)/T)$ in the non-smooth strongly convex case. To the best of our knowledge, these are the first bounds of this kind, and they almost match the minimax-optimal rates obtainable by appropriate averaging schemes. We also propose a new and simple averaging scheme which not only attains optimal rates, but can also be computed easily on-the-fly (in contrast, the suffix averaging scheme proposed in Rakhlin et al. (2011) is not as simple to implement). Finally, we provide some experimental illustrations.
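One on-the-fly averaging rule consistent with this description is polynomial-decay averaging, sketched below; the specific weight w_t = (eta + 1) / (t + eta) and the step-size schedule are assumptions for illustration, not necessarily the paper's exact constants.

```python
import numpy as np

def sgd_poly_averaging(sub_grad, x0, step=0.5, eta=3, T=1000, seed=0):
    """SGD for a non-smooth objective with an on-the-fly weighted average.

    The running average gives later iterates more weight via the rule
    avg <- (1 - w_t) * avg + w_t * x_t with w_t = (eta + 1) / (t + eta),
    and needs only O(1) extra memory -- unlike suffix averaging, which
    must know the suffix start in advance.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    avg = x0.copy()
    for t in range(1, T + 1):
        x -= step / np.sqrt(t) * sub_grad(x, rng)   # SGD step
        w = (eta + 1) / (t + eta)
        avg = (1 - w) * avg + w * x                 # on-the-fly average
    return avg

# Toy usage: noisy subgradients of the non-smooth f(x) = ||x - c||_1.
c = np.array([1.0, -2.0, 0.0])
x_bar = sgd_poly_averaging(
    lambda x, rng: np.sign(x - c) + 0.1 * rng.standard_normal(x.size),
    x0=np.zeros(3))
```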